“Provide background on your data sets and a clear formulated question or hypothesis.”
For this project, my question of interest is: “How does crime rate relate to income in Canada?”
To answer this question, data on both crime and socioeconomic status are needed. However, I found no existing data set that contains all the desired information, so it has to be assembled by merging more than one data set. After careful selection, I obtained the following two data sets:
“Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas”. Released 2023-05-02. This data set is updated annually and maintained by Statistics Canada (Table 11-10-0239-01). Data is collected through the Survey of Labour and Income Dynamics, the Survey of Consumer Finances, and the Canadian Income Survey.
“Incident-based crime statistics, by detailed violations, Canada, provinces, territories, Census Metropolitan Areas and Canadian Forces Military Police”. Released 2023-07-27. This data set is also annually updated and maintained by Statistics Canada (Table 35-10-0177-01, formerly CANSIM 252-0051). Data is collected through the Uniform Crime Reporting Survey.
Understanding the relationship between crime rates and income in Canada is crucial for policymakers, law enforcement agencies, and social welfare programs. Exploring this correlation can shed light on the socioeconomic factors driving criminal behavior and help formulate targeted interventions to alleviate poverty and reduce crime. Additionally, elucidating this connection can inform broader discussions on social inequality, justice, and community well-being in Canadian society.
“Include how and where the data were acquired, how you cleaned and wrangled the data, what tools you used for data exploration.”
Both data sets were downloaded directly from Statistics Canada, which is generally considered a reliable source. Because they share the same source, the data sets follow a similar structure and both contain the columns GEO and REF_DATE, where the former refers to the geographical region and the latter to the year of the data. Thus, it is possible to combine the two data sets to obtain all the information needed.
However, both data sets are large and contain unrelated information. Cleaning and wrangling are therefore needed for more convenient analysis and more efficient computing and uploading, since the original data sets are too large to be pushed to a GitHub repository.
Reference:
# Packages used below: dplyr (%>%, filter, group_by), tidyr (pivot_wider), ggplot2
library(dplyr)
library(tidyr)
library(ggplot2)
fn1 <- "https://raw.githubusercontent.com/inorrr/JSC370_project/main/census.csv"
fn2 <- "https://raw.githubusercontent.com/inorrr/JSC370_project/main/crime.csv"
# Download each file only if it is not already present locally
if (!file.exists("census.csv"))
  download.file(fn1, destfile = "census.csv")
census_df <- data.table::fread("census.csv")
if (!file.exists("crime.csv"))
  download.file(fn2, destfile = "crime.csv")
crime_df <- data.table::fread("crime.csv")
crime_df <- crime_df[, c("REF_DATE", "GEO", "Violations", "Statistics", "VALUE", "UOM")]
census_df <- census_df[, c("REF_DATE", "GEO", "Age group", "Sex", "Income source", "Statistics", "VALUE", "UOM", "SCALAR_FACTOR")]
table(census_df$GEO)
provinces1 <- c("Alberta [48]", "British Columbia [59]", "Manitoba [46]", "New Brunswick [13]",
"Newfoundland and Labrador [10]", "Saskatchewan [47]",
"Nova Scotia [12]", "Ontario [35]",
"Prince Edward Island [11]", "Quebec [24]")
provinces2 <- c("Alberta", "British Columbia", "Manitoba", "New Brunswick",
"Newfoundland and Labrador", "Saskatchewan","Nova Scotia",
"Ontario", "Prince Edward Island", "Quebec")
crime_df <- crime_df[crime_df$GEO %in% provinces1, ]
census_df <- census_df[census_df$GEO %in% provinces2, ]
crime_df$GEO <- gsub("\\s*\\[\\d+\\]$", "", crime_df$GEO)
table(crime_df$GEO)
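The pattern used to strip the bracketed region codes can be checked on a few sample labels. This is a minimal sketch: the vector `geo_labels` is made up for illustration and is not taken from the data.

```r
# Hypothetical sample of GEO labels in the style of the crime table
geo_labels <- c("Alberta [48]", "Prince Edward Island [11]", "Quebec")

# \\s*        optional whitespace before the bracket
# \\[\\d+\\]  a bracketed run of digits (the region code)
# $           anchored at the end of the string
clean_labels <- gsub("\\s*\\[\\d+\\]$", "", geo_labels)
print(clean_labels)
```

Labels without a trailing code are left unchanged, so the same call is safe to apply to a column that mixes both styles.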
For the crime data, the statistics of interest are the number of incidents (Actual incidents) and the crime rate (Rate per 100,000 population), so statistics related to charges are removed. The Crime Severity Index (Percentage contribution to the Crime Severity Index (CSI)) also seems interesting and is therefore kept.
crime_df <- crime_df %>% filter(Statistics == "Actual incidents" |
  Statistics == "Rate per 100,000 population" |
  Statistics == "Percentage contribution to the Crime Severity Index (CSI)")
print(length(unique(crime_df$Violations)))
table(crime_df$Violations)
# Identify rows that start with "Total"
total_rows <- grepl("^Total", crime_df$Violations)
# Subset the dataframe to keep only the rows starting with "Total"
crime_df <- crime_df[total_rows, , drop = FALSE]
# Remove square brackets and numbers at the end
crime_df$Violations <- gsub("\\s*\\[\\d+\\]$", "", crime_df$Violations)
In the census data, the column Age group specifies the age bracket. Since the crime data frame has no such information, all age groups need to be combined, which is done by averaging over the categories. The same method is applied to Sex.
table(census_df$"Age group")
table(census_df$"Sex")
# first we merge the age group categories
census_df <- census_df %>%
  group_by(REF_DATE, GEO, Sex, `Income source`, Statistics, UOM, SCALAR_FACTOR) %>%
  summarise(VALUE = mean(VALUE, na.rm = TRUE), .groups = "drop")
# next we merge the sex categories
census_df <- census_df %>%
  group_by(REF_DATE, GEO, `Income source`, Statistics, UOM, SCALAR_FACTOR) %>%
  summarise(VALUE = mean(VALUE, na.rm = TRUE), .groups = "drop")
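To make the collapsing step concrete, here is a small sketch on invented numbers. It uses base R `aggregate()` rather than dplyr so that it runs without extra packages. The data frame `toy` and its values are assumptions for illustration only.

```r
# Toy long-format income data (values are invented for illustration)
toy <- data.frame(
  REF_DATE = c(2020, 2020, 2020, 2020),
  GEO      = c("Ontario", "Ontario", "Quebec", "Quebec"),
  Sex      = c("Males", "Females", "Males", "Females"),
  VALUE    = c(50000, 40000, 46000, NA)
)

# Average VALUE within each (REF_DATE, GEO) pair, ignoring missing values.
# This is an unweighted mean, so every category counts equally.
collapsed <- aggregate(VALUE ~ REF_DATE + GEO, data = toy,
                       FUN = mean, na.rm = TRUE)
print(collapsed)
```

Because the mean is unweighted, a small category influences the result as much as a large one. A population-weighted average would need group sizes, which the toy data does not include.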
crime_df <- crime_df %>% filter(REF_DATE >= 1998 & REF_DATE <= 2021)
census_df <- census_df %>% filter(REF_DATE >= 1998 & REF_DATE <= 2021)
write.csv(crime_df, "/Users/yinuozhao/Desktop/UofT/JSC370/JSC370-2024-main/JSC370_project/crime.csv")
write.csv(census_df, "/Users/yinuozhao/Desktop/UofT/JSC370/JSC370-2024-main/JSC370_project/census.csv")
At this point, the crime data frame and the census data frame have REF_DATE and GEO in common, and each has one more categorical variable: Income source for the census data and Violations (the crime type) for the crime data. While it may seem sensible to join the two data sets on REF_DATE and GEO directly, the result would contain every combination of Income source and Violations for each REF_DATE and GEO. This would be a huge data set and would slow down computation. Therefore, I chose to keep the data sets separate and join them only when necessary (i.e. after picking out the categories of interest).
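The row blow-up from joining on the two keys alone can be seen on a toy example. This sketch uses base R `merge()` and invented values. The names `income`, `crime`, `all_pairs`, and `small_join` are hypothetical.

```r
# Toy tables sharing the keys REF_DATE and GEO (values invented)
income <- data.frame(
  REF_DATE = 2020, GEO = "Ontario",
  Income_source = c("Total income", "Employment income"),
  income_value = c(50000, 42000)
)
crime <- data.frame(
  REF_DATE = 2020, GEO = "Ontario",
  Violations = c("Total robbery", "Total mischief", "Total drug violations"),
  rate = c(60, 300, 150)
)

# Joining on the keys alone pairs every violation with every income source
all_pairs <- merge(crime, income, by = c("REF_DATE", "GEO"))
nrow(all_pairs)   # 3 violations x 2 income sources

# Filtering each table first keeps a single row per (REF_DATE, GEO)
small_join <- merge(
  subset(crime, Violations == "Total robbery"),
  subset(income, Income_source == "Total income"),
  by = c("REF_DATE", "GEO")
)
nrow(small_join)
```

With dozens of violation types and income sources, the unfiltered join multiplies the row count by their product for each year and province, which is why filtering first is cheaper.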
Both data sets are currently in long format, so I converted them to wide format for convenience.
crime_df <- pivot_wider(crime_df, id_cols = c(REF_DATE, GEO, Violations),
names_from = Statistics, values_from = VALUE)
crime_df <- na.omit(crime_df)
census_df <- pivot_wider(census_df, id_cols = c(REF_DATE, GEO, `Income source`),
names_from = Statistics, values_from = VALUE)
census_df <- na.omit(census_df)
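The long-to-wide conversion followed by `na.omit()` can be illustrated on a tiny example. The sketch below uses base R `reshape()` as a stand-in for `pivot_wider()` so that it needs no extra packages. The data and the resulting column names (prefixed with `VALUE.`) are specific to this illustration.

```r
# Toy long-format data: one row per (year, region, statistic)
long <- data.frame(
  REF_DATE   = c(2020, 2020, 2021, 2021),
  GEO        = "Ontario",
  Statistics = c("Actual incidents", "Rate per 100,000 population",
                 "Actual incidents", "Rate per 100,000 population"),
  VALUE      = c(100, 5, 120, NA)
)

# One row per (REF_DATE, GEO); each statistic becomes its own column
wide <- reshape(long, idvar = c("REF_DATE", "GEO"),
                timevar = "Statistics", direction = "wide")

# na.omit() drops any row with at least one missing statistic
complete <- na.omit(wide)
print(names(wide))
print(complete)
```

Note that `na.omit()` removes the whole 2021 row here even though only one of its statistics is missing, so the wide table can lose more information than the long one would.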
Check the dimensions and headers and footers of the data
dim(census_df)
dim(crime_df)
head(crime_df)
tail(crime_df)
head(census_df)
tail(census_df)
The census data set has 8 variables and 3613 observations; the crime data set has 6 variables and 9271 observations. Judging from the headers and footers, both data sets seem to have been imported correctly and contain no missing values (in the displayed rows).
Check the variable types in the data
str(census_df)
str(crime_df)
summary(census_df)
summary(crime_df)
In both data frames, the variable types are a mix of integer, numeric, and character. All variable types align with the context of the variables. No major problems arise with the data at this stage (e.g. a variable with all missing values).
Take a closer look at some/all of the variables
For both data frames, REF_DATE and GEO should correctly identify a valid year and a province in Canada. For the census data frame, we need to look at the values of the different types of income (median, aggregate, etc.). For the crime data frame, we need to check that the recorded crime rate and the actual number of incidents are within a reasonable range.
table(census_df$REF_DATE)
table(census_df$GEO)
table(crime_df$REF_DATE)
table(crime_df$GEO)
summary(census_df$`Aggregate income`)
summary(census_df$`Average income (excluding zeros)`)
summary(census_df$`Median income (excluding zeros)`)
summary(crime_df$`Actual incidents`)
summary(crime_df$`Rate per 100,000 population`)
Both data sets contain data from 1998 to 2021 for the 10 provinces in Canada, as expected given how the data sets were cleaned. The other variables checked are within a reasonable range. Aggregate income, average income, and median income are all measured in 2021 constant dollars, with aggregate income recorded in millions. Crime rates are measured as the number of incidents per 100,000 population.
Validate with an external source
Notice that the minimum average income is 677.8, which is much lower than the mean average income and even 10 times lower than the first quartile. Since this seems suspicious, it needs to be validated.
census_df[which.min(census_df$`Average income (excluding zeros)`), ]
This observation is from Prince Edward Island in 2004, and the income source is “Other government transfers”. Upon research, government transfers refer to assistance from provincial and municipal programs, Workers’ Compensation benefits, the GST/HST Credit, and provincial refundable tax credits such as the Quebec and Newfoundland and Labrador sales tax credits. However, since many of these are given their own categories and excluded from “Other government transfers” in the data set, it makes sense that the value is low.
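The informal check applied here, flagging a minimum that sits far below the first quartile, can be written down as a short sketch. The vector `avg_income` is invented, and the factor of 10 is simply the rule of thumb mentioned in the text, not a standard statistical test.

```r
# Invented average-income values; the first one mimics the suspicious minimum
avg_income <- c(677.8, 9500, 12000, 30000, 41000, 52000)

# First quartile (R's default quantile type) and the smallest value
q1 <- unname(quantile(avg_income, 0.25))
min_value <- min(avg_income)

# Flag the minimum when it is more than 10 times smaller than the first quartile
suspicious <- min_value < q1 / 10
which.min(avg_income)   # position of the value to inspect
print(suspicious)
```

A flagged value is only a prompt to look at the row, as done with `which.min()` on the census data frame. Whether it is an error depends on what the category measures.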
“Provide summary statistics in tabular form and publication-quality figures, take a look at the kable function from knitr to write nice tables in Rmarkdown.”
Firstly, I examined the trend in crime rate across provinces and crime types.
filtered_data = crime_df %>% filter(Violations=="Total, all violations")
unique_x_values <- unique(crime_df$REF_DATE)
ggplot(filtered_data, aes(x = REF_DATE, y = `Rate per 100,000 population`, color = GEO)) +
geom_line() +
labs(x = "Year", y = "Rate per 100,000 population", title = "Rates of Total Crime by Province") +
scale_x_continuous(breaks = unique_x_values) +
scale_color_discrete(name = "Provinces") +
theme_linedraw() +
theme(legend.position = "right") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(plot.title = element_text(face = "bold")) +
theme(plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm")) +
theme(axis.text = element_text(size = 7)) +
theme(legend.title = element_text(face = "bold"))
The plot depicts the total crime rate, encompassing all
types of crimes, from 1998 to 2021, with each province represented by a
distinct color. The crime rate is measured per 100,000 population.
Notably, Saskatchewan consistently exhibits a significantly higher crime
rate compared to other provinces throughout the period of 1998 to 2021.
Conversely, Quebec and Ontario consistently demonstrate the lowest crime
rates. Across all provinces, there is a discernible decreasing
trend in crime rates over the years, with many provinces
experiencing peak crime rates in 2003-2004.
In addition to analyzing total crime, I delved into specific crime categories often associated with poverty: break and enter, robbery, and prostitution. The three accompanying plots illustrate their respective rates over the years. Overall, there is a decreasing trend in the rates of all three crimes, with occasional exceptions such as robbery rates in Manitoba. Notably, British Columbia stands out with a significantly high rate of prostitution in 2004, doubling the number reported in Saskatchewan, which held the second-highest rate that year.
Note that the three plots below share the same legend as the plot above, so the legend is omitted for a cleaner display. The code is also not shown, as it reuses the code chunk above.
Presented below is a table summarizing the average crime rate across
all crime types (excluding Total, all violations)
categorized by province and year. The cells are color-coded by value,
with lighter shades indicating higher values and darker shades
representing smaller values. Upon scrolling through the table, it
becomes evident that across all provinces, there is a discernible
decreasing trend in the average crime rate, as
evidenced by the darkening shades in each column. This observation
aligns with the findings and inferences drawn from the preceding
plots.
| Year | Alberta | British Columbia | Manitoba | New Brunswick | Newfoundland and Labrador | Nova Scotia | Ontario | Prince Edward Island | Quebec | Saskatchewan |
|---|---|---|---|---|---|---|---|---|---|---|
| 1998 | 990.7389 | 1310.2795 | 1140.4897 | 746.1692 | 644.1653 | 875.0903 | 776.1408 | 686.4670 | 702.7424 | 1420.691 |
| 1999 | 996.9833 | 1243.8727 | 1139.4081 | 751.8170 | 616.3186 | 890.2262 | 705.8405 | 786.3608 | 649.5519 | 1372.294 |
| 2000 | 938.1432 | 1202.5849 | 1146.4238 | 719.6546 | 621.9546 | 807.6322 | 695.1511 | 739.1711 | 653.3770 | 1428.722 |
| 2001 | 979.9516 | 1233.4349 | 1213.3189 | 712.1708 | 617.2849 | 812.8562 | 672.0454 | 752.6111 | 634.2719 | 1518.370 |
| 2002 | 994.3392 | 1245.2322 | 1202.5438 | 726.1322 | 639.0897 | 816.2289 | 651.8003 | 850.6072 | 617.0995 | 1505.492 |
| 2003 | 1065.0392 | 1317.9670 | 1366.9247 | 758.0414 | 657.4530 | 893.6070 | 632.2778 | 915.8641 | 638.6003 | 1728.543 |
| 2004 | 1098.5319 | 1315.9251 | 1347.9319 | 780.0538 | 667.5708 | 915.0119 | 595.3678 | 893.0314 | 624.6361 | 1684.091 |
| 2005 | 1079.1975 | 1260.0459 | 1278.7128 | 692.0257 | 650.8968 | 859.9862 | 574.3292 | 812.8373 | 615.6142 | 1649.431 |
| 2006 | 1007.2778 | 1203.3584 | 1260.0861 | 648.6084 | 667.1994 | 856.8851 | 589.4051 | 737.7965 | 609.9108 | 1502.769 |
| 2007 | 1018.9281 | 1137.0500 | 1226.7060 | 608.8173 | 697.5675 | 803.5808 | 560.8689 | 670.0778 | 584.9914 | 1480.692 |
| 2008 | 984.9581 | 1046.3411 | 1081.2483 | 624.8186 | 696.9058 | 772.9158 | 537.7905 | 683.8757 | 589.6462 | 1412.581 |
| 2009 | 935.6630 | 983.8919 | 1153.1356 | 615.0719 | 716.9997 | 773.6461 | 521.8932 | 697.9243 | 578.2262 | 1403.483 |
| 2010 | 884.1551 | 933.4835 | 1043.1349 | 625.0964 | 725.7876 | 763.8995 | 496.5708 | 699.9311 | 562.8811 | 1432.178 |
| 2011 | 812.9859 | 880.2127 | 1012.0920 | 577.9611 | 705.3294 | 714.0051 | 468.0995 | 710.5243 | 519.5841 | 1408.674 |
| 2012 | 819.5778 | 856.2392 | 1000.7560 | 615.4689 | 696.8877 | 714.1444 | 449.8354 | 721.6681 | 510.7627 | 1311.701 |
| 2013 | 778.9997 | 803.0581 | 866.5694 | 538.5750 | 682.2809 | 639.5953 | 408.5759 | 669.5086 | 460.8854 | 1218.580 |
| 2014 | 783.3216 | 792.6345 | 840.0633 | 482.3608 | 636.2366 | 587.6563 | 391.6511 | 510.4745 | 421.5565 | 1155.519 |
| 2015 | 855.7324 | 827.4308 | 894.7058 | 527.3759 | 635.6389 | 551.9057 | 391.9432 | 451.0126 | 399.5150 | 1258.409 |
| 2016 | 869.6887 | 818.0165 | 956.7842 | 505.9946 | 630.6538 | 538.1619 | 399.0192 | 468.5732 | 405.6246 | 1317.321 |
| 2017 | 927.8811 | 781.4886 | 955.9997 | 549.3957 | 583.0170 | 549.4824 | 417.5035 | 439.5063 | 410.9189 | 1271.176 |
| 2018 | 886.8923 | 718.8339 | 912.0272 | 497.1263 | 502.9756 | 493.3066 | 400.3605 | 447.9967 | 339.5035 | 1188.911 |
| 2019 | 925.4612 | 796.9984 | 1016.7941 | 544.1014 | 553.7074 | 498.5129 | 404.5249 | 518.0077 | 326.3725 | 1151.130 |
| 2020 | 788.6307 | 735.6162 | 943.0979 | 564.7014 | 557.8670 | 491.2514 | 356.3917 | 446.2160 | 301.0057 | 1085.383 |
| 2021 | 730.3198 | 703.6836 | 839.5900 | 590.1656 | 602.4644 | 489.7486 | 360.7433 | 418.9484 | 312.5548 | 1106.370 |
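A year-by-province table of average rates like the one above could be computed along the following lines. This is a sketch on invented data using base R `tapply()`. The actual report may have used a different approach, and the colour coding is not reproduced here.

```r
# Toy crime rates: two violation types per province and year (values invented)
toy <- data.frame(
  REF_DATE = c(2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021),
  GEO      = rep(c("Ontario", "Ontario", "Quebec", "Quebec"), 2),
  rate     = c(100, 300, 50, 150, 80, 280, 40, 120)
)

# Mean rate across violation types, arranged as years (rows) by provinces (columns)
avg_table <- tapply(toy$rate,
                    list(Year = toy$REF_DATE, Province = toy$GEO),
                    mean)
print(avg_table)
```

The resulting matrix can be passed to `knitr::kable()` to render a plain table in R Markdown.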
Given the general decrease in crime rates, I am interested in exploring the trend of income to ascertain the potential existence of an association.
The following plot illustrates the average total income of each province over time. On the whole, the average total income for all provinces exhibits a steady upward trend. Notably, since 2003, Alberta has surpassed Ontario to become the province with the highest average total income. There is also a slight decrease in income in most provinces around 2019. This dip is close in time to the COVID-19 pandemic, but since the pandemic began in 2020, the attribution is tentative.
The box plot below also shows the average total income for different years, but this time combining data from all provinces. The pink dots represent the actual values for each year in each province. It is clear from the boxes that the average income is increasing over the years. Notably, between 2010 and 2016, there are some outliers with very high incomes, which correspond to Alberta when compared with the previous plot.
After examining total income, I delved into specific income sources: employment income, investment income, and market income, which are major income categories. The plots below illustrate that all three types of income are increasing. However, employment and market income show a steadier growth pattern, while investment income fluctuates dramatically from year to year. (As before, the legend is omitted for display purposes since I am only interested in the overall trend, not in how provinces compare to each other.)
The table below provides a summary of the average total income of all
provinces from 1998 to 2021. The color scale used is consistent with
that of the crime data frame, where lighter shades denote higher values.
Over time, there is a discernible increase in income, as evidenced by
the progressive lightening of colors.
| Year | Alberta | British Columbia | Manitoba | New Brunswick | Newfoundland and Labrador | Nova Scotia | Ontario | Prince Edward Island | Quebec | Saskatchewan |
|---|---|---|---|---|---|---|---|---|---|---|
| 1998 | 44804.17 | 40441.67 | 38208.33 | 34125.00 | 30316.67 | 34320.83 | 45100.00 | 32633.33 | 37412.50 | 37370.83 |
| 1999 | 44237.50 | 41416.67 | 38075.00 | 35112.50 | 31783.33 | 36316.67 | 46620.83 | 33058.33 | 38520.83 | 38000.00 |
| 2000 | 45795.83 | 41462.50 | 38841.67 | 35654.17 | 32533.33 | 37070.83 | 48170.83 | 34225.00 | 39716.67 | 38579.17 |
| 2001 | 47845.83 | 41916.67 | 39650.00 | 36537.50 | 32600.00 | 37950.00 | 48279.17 | 34445.83 | 40520.83 | 40308.33 |
| 2002 | 46983.33 | 42695.83 | 39629.17 | 35870.83 | 33270.83 | 38487.50 | 48091.67 | 35020.83 | 40791.67 | 40150.00 |
| 2003 | 47920.83 | 41787.50 | 40129.17 | 36108.33 | 33125.00 | 37716.67 | 47812.50 | 35325.00 | 40570.83 | 40554.17 |
| 2004 | 49666.67 | 42941.67 | 41029.17 | 36433.33 | 33470.83 | 38187.50 | 48170.83 | 36070.83 | 41600.00 | 40345.83 |
| 2005 | 51758.33 | 43800.00 | 41700.00 | 36400.00 | 35012.50 | 39283.33 | 48395.83 | 36800.00 | 40804.17 | 42166.67 |
| 2006 | 53820.83 | 44679.17 | 42562.50 | 37479.17 | 36858.33 | 40408.33 | 47258.33 | 38000.00 | 41604.17 | 44904.17 |
| 2007 | 56879.17 | 45850.00 | 44633.33 | 39200.00 | 39304.17 | 41320.83 | 48079.17 | 38120.83 | 42441.67 | 47750.00 |
| 2008 | 57337.50 | 46858.33 | 45875.00 | 39520.83 | 40412.50 | 40933.33 | 49212.50 | 39500.00 | 41925.00 | 49100.00 |
| 2009 | 57237.50 | 45875.00 | 45437.50 | 40466.67 | 40500.00 | 42362.50 | 48404.17 | 39587.50 | 42545.83 | 50650.00 |
| 2010 | 57650.00 | 45400.00 | 45129.17 | 40470.83 | 42195.83 | 41579.17 | 48891.67 | 39841.67 | 42662.50 | 50337.50 |
| 2011 | 59145.83 | 45633.33 | 44687.50 | 41508.33 | 44045.83 | 42516.67 | 48133.33 | 41354.17 | 43833.33 | 52483.33 |
| 2012 | 62166.67 | 46400.00 | 45195.83 | 41458.33 | 46112.50 | 43358.33 | 48645.83 | 40562.50 | 44308.33 | 52687.50 |
| 2013 | 61962.50 | 47854.17 | 46837.50 | 42095.83 | 48583.33 | 45220.83 | 49600.00 | 42529.17 | 44716.67 | 53658.33 |
| 2014 | 63200.00 | 47929.17 | 47129.17 | 42545.83 | 49654.17 | 45504.17 | 50062.50 | 42658.33 | 45041.67 | 55975.00 |
| 2015 | 64020.83 | 47437.50 | 48054.17 | 42104.17 | 50033.33 | 45445.83 | 51083.33 | 43308.33 | 44354.17 | 54908.33 |
| 2016 | 57637.50 | 47354.17 | 47425.00 | 43383.33 | 48466.67 | 45908.33 | 51191.67 | 43366.67 | 45983.33 | 52816.67 |
| 2017 | 59825.00 | 50650.00 | 49529.17 | 44858.33 | 48633.33 | 45612.50 | 52683.33 | 44245.83 | 46412.50 | 53900.00 |
| 2018 | 59458.33 | 51012.50 | 48875.00 | 46133.33 | 49154.17 | 46508.33 | 52650.00 | 44962.50 | 47091.67 | 52412.50 |
| 2019 | 58787.50 | 51791.67 | 48016.67 | 45779.17 | 48695.83 | 46437.50 | 52162.50 | 44616.67 | 48854.17 | 51304.17 |
| 2020 | 57991.67 | 54508.33 | 50758.33 | 48325.00 | 50183.33 | 48558.33 | 55320.83 | 47558.33 | 51000.00 | 53500.00 |
| 2021 | 59270.83 | 55733.33 | 50300.00 | 48358.33 | 51308.33 | 49062.50 | 56441.67 | 47737.50 | 52195.83 | 53025.00 |
After separately exploring the two data sets, there appears to be a potential relationship between crime rate and income. However, further experimentation using both data sets together is necessary to confirm and understand this relationship more comprehensively.
filtered_census <- census_df %>% filter(`Income source` == "Total income")
filtered_crime <- crime_df %>% filter(Violations == "Total, all violations")
joint_df <- inner_join(filtered_crime, filtered_census, by = c("REF_DATE", "GEO"))
# point size is a constant, so it is set outside aes() and needs no legend entry
ggplot(data = joint_df, aes(x = `Average income (excluding zeros)`, y = `Rate per 100,000 population`, color = GEO)) +
  geom_point(alpha = 0.3, size = 3) +
  geom_smooth(method = "lm", se = FALSE, size = 1.0) +
  labs(x = "Average Total Income (excluding zeros)", y = "Total Crime Rate per 100,000 population", title = "Average Total Income and Total Crime Rate") +
  theme_linedraw() +
  scale_color_discrete(name = "Provinces") +
  theme(axis.text = element_text(size = 7)) +
  theme(plot.title = element_text(face = "bold")) +
  theme(legend.title = element_text(face = "bold")) +
  theme(plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"))
The scatter plot presented demonstrates the relationship between average
total income and total crime rate (ignoring crime type). Although the
overall distribution of points may not reveal a strong relationship,
upon coloring the points by province, a distinct pattern emerges. It
becomes evident that average total income and total crime rate are
negatively correlated across all provinces, with
possibly the exception of Newfoundland and Labrador where the
relationship appears to be less pronounced, indicated by a line that is
nearly horizontal. To draw a confident conclusion, it is imperative to
examine the actual correlation value between these variables.
| Province | Correlation |
|---|---|
| Alberta | -0.7023852 |
| British Columbia | -0.8154907 |
| Manitoba | -0.7392303 |
| New Brunswick | -0.5966958 |
| Newfoundland and Labrador | -0.0025743 |
| Nova Scotia | -0.9236636 |
| Ontario | -0.7583329 |
| Prince Edward Island | -0.8082244 |
| Quebec | -0.9447648 |
| Saskatchewan | -0.7695228 |
From here, we can see that all correlation values are below zero, including Newfoundland and Labrador, which has a very weak negative correlation. Quebec exhibits the strongest correlation between average total income and total crime rate, with a value close to -1.
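Per-province correlations of this kind can be computed by splitting the joined data by province and calling `cor()` on each piece. The sketch below assumes the Pearson coefficient (the default of `cor()`). The report does not state which coefficient was used, and the data frame `toy` is invented.

```r
# Toy joined data: province "A" is perfectly negatively related, "B" is unrelated
toy <- data.frame(
  GEO    = rep(c("A", "B"), each = 4),
  income = c(1, 2, 3, 4, 1, 2, 3, 4),
  rate   = c(8, 6, 4, 2, 5, 7, 4, 6)
)

# Split by province, then compute one Pearson correlation per province
by_province  <- split(toy, toy$GEO)
correlations <- sapply(by_province, function(d) cor(d$income, d$rate))
print(correlations)
```

With one observation per year, each province contributes at most 24 points, so a value such as -0.6 carries more uncertainty than the number of decimals suggests.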
The three smaller plots investigate the relationship between a specific
income source and a particular type of violation, as indicated by the
legend. In the first plot, which examines the relationship between
robbery crime rate and employment income, all provinces
display a negative trend except for Newfoundland and Labrador. Manitoba
demonstrates a weak relationship, as evidenced by the considerable
dispersion of points around the line.
Moving to the second plot, which contrasts market income with property crime rate, a strong and negative relationship is apparent. The third plot, depicting employment income versus prostitution crime rate, reveals varying degrees of association across provinces, with many showing a weak relationship. Notably, Ontario exhibits a positive relationship between employment income and prostitution crime rate.
Since Ontario shows a different relationship from the other provinces in the previous plot, I chose it for a closer look. The bar plot below shows the composition of crime incidents each year in Ontario. We see that the major categories are property crime and weapon violations.
| Violations | Canada Pension Plan (CPP) and Quebec Pension Plan (QPP) benefits | Child benefits | Employment Insurance (EI) benefits | Employment income | Government transfers | Investment income | Market income | Old Age Security (OAS) and Guaranteed Income Supplement (GIS) | Other government transfers | Other income | Retirement income | Self-employment income | Social assistance | Total income | Wages, salaries and commissions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total Cannabis Act | 0.6399496 | 0.1543812 | 0.1797670 | 0.0753819 | 0.4470022 | 0.8532210 | 0.1175474 | 0.5881777 | 0.4757809 | 0.3878745 | -0.3024679 | 0.7318504 | -0.0331746 | 0.3492095 | 0.0793002 |
| Total Criminal Code traffic violations | -0.7186518 | -0.7033720 | -0.3111971 | -0.8256471 | -0.6259757 | -0.7113270 | -0.8392439 | -0.7061295 | -0.4641019 | -0.6126601 | -0.7662505 | 0.4119972 | -0.2573561 | -0.8091403 | -0.8124589 |
| Total Federal Statute violations | -0.6119287 | -0.7838763 | -0.3673042 | -0.7543371 | -0.7932252 | -0.7381556 | -0.7753723 | -0.6782687 | -0.5159253 | -0.6914845 | -0.6938453 | 0.5508116 | -0.4218109 | -0.8030217 | -0.7323180 |
| Total Immigration and Refugee Protection Act | -0.8317978 | -0.7728983 | -0.4672506 | -0.6495341 | -0.6644657 | -0.7653287 | -0.7071820 | -0.8112917 | -0.6886685 | -0.4842990 | -0.7879141 | 0.7048638 | -0.7783455 | -0.7171724 | -0.6634925 |
| Total abduction | -0.8040213 | -0.7333317 | -0.4912742 | -0.8052992 | -0.6300930 | -0.6951519 | -0.8119927 | -0.7372497 | -0.5562309 | -0.6732663 | -0.7985790 | 0.4697212 | -0.4617595 | -0.7891757 | -0.8045523 |
| Total administration of justice violations | 0.2894308 | 0.5634451 | 0.0545464 | 0.5244904 | 0.3645871 | 0.3851740 | 0.5142525 | 0.4384029 | 0.2325288 | 0.5417441 | 0.3120033 | -0.3465106 | 0.2938743 | 0.4909069 | 0.5033958 |
| Total assaults against a peace officer | 0.5502388 | 0.6390806 | 0.5288927 | 0.6454176 | 0.6027397 | 0.4865436 | 0.6319311 | 0.5931247 | 0.7434956 | 0.6854612 | 0.4739395 | -0.2753145 | 0.2414854 | 0.6431781 | 0.6085258 |
| Total breaking and entering | -0.9225065 | -0.8356442 | -0.4924572 | -0.8626247 | -0.7424577 | -0.8514282 | -0.8942464 | -0.8359850 | -0.6090103 | -0.6813438 | -0.9195154 | 0.6695426 | -0.6051515 | -0.8816448 | -0.8744406 |
| Total cannabis, trafficking, production or distribution (pre-legalization) | -0.8042443 | -0.7928160 | -0.2177518 | -0.8090668 | -0.7933560 | -0.7772350 | -0.8522811 | -0.7282460 | -0.2614858 | -0.5575754 | -0.8674750 | 0.5659155 | -0.6541342 | -0.8558468 | -0.8056268 |
| Total cocaine, trafficking, production or distribution | 0.0080186 | -0.0443540 | 0.1815189 | 0.0002429 | -0.0576558 | -0.0816744 | 0.0032864 | -0.0408692 | 0.0252626 | 0.2132940 | -0.0442931 | 0.0848983 | -0.1543917 | -0.0123439 | -0.0202027 |
| Total distribution - Cannabis Act | 0.2532874 | 0.3697878 | 0.4186719 | 0.2564310 | 0.7846734 | 0.8875198 | 0.3338212 | 0.8638276 | 0.5252365 | 0.5014931 | -0.1813740 | 0.4965440 | -0.3997706 | 0.6914302 | 0.2530735 |
| Total drug violations | -0.3430541 | -0.5827351 | -0.2779386 | -0.5463706 | -0.6396933 | -0.5269176 | -0.5575712 | -0.4479062 | -0.3701146 | -0.5511335 | -0.4286892 | 0.3378455 | -0.2088664 | -0.5953362 | -0.5170650 |
| Total fail to stop or remain | -0.4034206 | -0.5030218 | -0.2803077 | -0.6386394 | -0.5070987 | -0.4955722 | -0.6375927 | -0.4195422 | -0.3208439 | -0.5573177 | -0.4472545 | 0.1577972 | 0.1608018 | -0.6228607 | -0.6104423 |
| Total firearms, use of, discharge, pointing | 0.4841041 | 0.7126655 | 0.3222550 | 0.7080560 | 0.7362454 | 0.6864169 | 0.7176776 | 0.5769022 | 0.5186607 | 0.6327615 | 0.5410733 | -0.4276664 | 0.1780762 | 0.7437977 | 0.6955827 |
| Total forcible confinement or kidnapping | -0.0720063 | 0.0112625 | 0.1985305 | 0.0983675 | -0.0452937 | -0.1057013 | 0.0631279 | -0.0020276 | 0.0240264 | 0.3024561 | -0.0950879 | 0.2446965 | -0.2214532 | 0.0370225 | 0.0638929 |
| Total impaired driving | -0.8557310 | -0.7818825 | -0.3409620 | -0.8188966 | -0.6783186 | -0.7761839 | -0.8456043 | -0.8438246 | -0.5720862 | -0.5889177 | -0.8976395 | 0.6189236 | -0.6575261 | -0.8275574 | -0.8216312 |
| Total importation and exportation - Cannabis Act | 0.6767907 | 0.1838120 | 0.2046896 | 0.1116378 | 0.4052830 | 0.8176372 | 0.1478339 | 0.5465302 | 0.4199046 | 0.4234147 | -0.3605503 | 0.7744642 | -0.0363785 | 0.3419987 | 0.1164391 |
| Total mischief | -0.9421800 | -0.8471372 | -0.3820440 | -0.8534749 | -0.7563984 | -0.8918294 | -0.8895470 | -0.8566592 | -0.6105450 | -0.5876269 | -0.9529915 | 0.7451827 | -0.7189232 | -0.8816163 | -0.8709267 |
| Total offences in relation to sexual services | 0.1489431 | 0.0213727 | -0.4663334 | 0.0241671 | -0.3243725 | -0.1657629 | 0.0339013 | 0.5777400 | 0.0346752 | -0.1125754 | 0.0358541 | -0.5972165 | 0.6151757 | -0.1268857 | 0.0749833 |
| Total other Controlled Drugs and Substances Act drugs, trafficking, production or distribution | 0.6495872 | 0.6406151 | 0.6388294 | 0.7081229 | 0.6977985 | 0.6213830 | 0.7108679 | 0.6006871 | 0.6900359 | 0.6663725 | 0.6454468 | -0.3011561 | 0.1776899 | 0.7286213 | 0.7085948 |
| Total other Criminal Code traffic violations | -0.3874440 | -0.4263617 | -0.1928496 | -0.5842861 | -0.3942984 | -0.4435046 | -0.5827052 | -0.3781473 | -0.2337219 | -0.4494342 | -0.4256197 | 0.1133769 | 0.1587207 | -0.5513948 | -0.5615534 |
| Total other Criminal Code violations | -0.4582604 | -0.1970822 | -0.2703928 | -0.3107493 | -0.2131728 | -0.3098238 | -0.3272181 | -0.2927299 | -0.2581102 | -0.1374758 | -0.4353647 | 0.1195234 | -0.1448995 | -0.3075077 | -0.3212000 |
| Total other Federal Statutes | -0.5031877 | -0.4959433 | -0.1536789 | -0.4251374 | -0.5025090 | -0.4955586 | -0.4627664 | -0.5004846 | -0.3190501 | -0.2850063 | -0.5950219 | 0.5122293 | -0.5006676 | -0.4867740 | -0.4229529 |
| Total other assaults | -0.7774742 | -0.8203756 | -0.3529050 | -0.8408717 | -0.7923596 | -0.8219512 | -0.8592494 | -0.7914887 | -0.5078342 | -0.5963271 | -0.8563039 | 0.6342825 | -0.5585913 | -0.8675191 | -0.8394781 |
| Total other violations | -0.6948372 | -0.6049911 | -0.3362882 | -0.7393387 | -0.4829557 | -0.6199522 | -0.7453292 | -0.6079443 | -0.4412007 | -0.5557644 | -0.6815783 | 0.3165638 | -0.2428936 | -0.6997612 | -0.7285573 |
| Total other violations causing death | -0.7013575 | -0.5561567 | -0.4128154 | -0.5330036 | -0.5441578 | -0.6001823 | -0.5702280 | -0.6020835 | -0.5816201 | -0.4667281 | -0.5995986 | 0.4220517 | -0.5101842 | -0.5804457 | -0.5177336 |
| Total other violent violations | 0.6964391 | 0.7813993 | 0.2698997 | 0.6861032 | 0.7566179 | 0.7006098 | 0.7205615 | 0.7710238 | 0.5678581 | 0.5599879 | 0.7078508 | -0.7407048 | 0.5774332 | 0.7512794 | 0.7146516 |
| Total possession - Cannabis Act | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Total possession of stolen property | -0.8196812 | -0.7026968 | -0.2209310 | -0.7205022 | -0.6337884 | -0.8151877 | -0.7595498 | -0.6951892 | -0.4070943 | -0.4134656 | -0.8648735 | 0.7192309 | -0.6326056 | -0.7496622 | -0.7671210 |
| Total production - Cannabis Act | -0.7823286 | -0.3321634 | -0.2575272 | -0.4347831 | 0.8863796 | 0.8552884 | -0.3390936 | 0.9509705 | 0.8444271 | -0.5354741 | 0.9226652 | -0.9901246 | -0.1265695 | 0.3094884 | -0.4511187 |
| Total property crime violations | -0.8978577 | -0.7543288 | -0.4203821 | -0.7746335 | -0.6737176 | -0.8114778 | -0.8121705 | -0.7934432 | -0.5476628 | -0.5366632 | -0.8962551 | 0.6683060 | -0.6368281 | -0.8005713 | -0.7949070 |
| Total prostitution | -0.8489033 | -0.7559321 | -0.5186475 | -0.9180496 | -0.7359056 | -0.7157644 | -0.9652457 | -0.6890291 | -0.5180918 | -0.5513721 | -0.8177643 | 0.2252633 | -0.2280487 | -0.9558729 | -0.9036327 |
| Total robbery | -0.5709575 | -0.6007255 | -0.2973412 | -0.6191589 | -0.6429332 | -0.6640692 | -0.6398530 | -0.5423024 | -0.3066751 | -0.3943991 | -0.6745697 | 0.5084582 | -0.3464958 | -0.6596610 | -0.6230131 |
| Total sale - Cannabis Act | 0.8305721 | -0.1995422 | -0.1936661 | -0.2414690 | 0.0859602 | 0.7246261 | -0.2315929 | 0.2651273 | 0.4681703 | 0.0796278 | -0.2133301 | 0.7179241 | 0.3914490 | -0.0892274 | -0.2326984 |
| Total sexual violations against children | 0.8340545 | 0.9631761 | 0.4861737 | 0.9038366 | 0.8848393 | 0.8979768 | 0.9343577 | 0.8850667 | 0.6683254 | 0.7698027 | 0.8622806 | -0.7587430 | 0.6418588 | 0.9493437 | 0.9126578 |
| Total theft of motor vehicle | -0.8934673 | -0.7086737 | -0.3710112 | -0.6914095 | -0.6038020 | -0.7685217 | -0.7373869 | -0.7760287 | -0.5825151 | -0.4755032 | -0.8544672 | 0.6865498 | -0.7347073 | -0.7248222 | -0.7254220 |
| Total theft over $5,000 (non-motor vehicle) | -0.6491545 | -0.4225805 | -0.3014013 | -0.4782235 | -0.3342665 | -0.4653237 | -0.4965087 | -0.5338034 | -0.3649756 | -0.3222718 | -0.6086177 | 0.3643097 | -0.4855605 | -0.4693896 | -0.4890690 |
| Total theft under $5,000 (non-motor vehicle) | -0.8094329 | -0.7399273 | -0.4824113 | -0.7709340 | -0.7138874 | -0.7738704 | -0.8009891 | -0.7576249 | -0.5158623 | -0.5842126 | -0.8346564 | 0.5971284 | -0.4862269 | -0.8023110 | -0.7757068 |
| Total trafficking in stolen property | 0.8607122 | 0.8837019 | 0.3888328 | 0.8627692 | 0.8116025 | 0.9082499 | 0.9003260 | 0.8123087 | 0.5383496 | 0.6748250 | 0.9082349 | -0.7849822 | 0.6359333 | 0.9041812 | 0.8854195 |
| Total violent Criminal Code violations | -0.6680451 | -0.4037117 | -0.1343821 | -0.3982306 | -0.3211282 | -0.5137263 | -0.4445947 | -0.5525946 | -0.2576917 | -0.1123500 | -0.6785080 | 0.5155067 | -0.7056360 | -0.4259395 | -0.4327636 |
| Total weapons violations | -0.5488710 | -0.2894091 | -0.1254973 | -0.4402386 | -0.1719470 | -0.4010723 | -0.4336278 | -0.3638832 | -0.2474038 | -0.1334781 | -0.5087586 | 0.1434243 | -0.2617335 | -0.3789758 | -0.4719344 |
| Total, all Criminal Code violations (excluding traffic) | -0.8604485 | -0.6786744 | -0.3832652 | -0.7068606 | -0.6046292 | -0.7527368 | -0.7452086 | -0.7421601 | -0.5006376 | -0.4571058 | -0.8581022 | 0.6179758 | -0.6235580 | -0.7310706 | -0.7291286 |
| Total, all Criminal Code violations (including traffic) | -0.8634277 | -0.6877659 | -0.3841524 | -0.7209247 | -0.6126964 | -0.7594233 | -0.7584953 | -0.7489895 | -0.5046405 | -0.4700883 | -0.8635202 | 0.6149735 | -0.6127094 | -0.7434050 | -0.7417003 |
| Total, all violations | -0.8875784 | -0.7308819 | -0.4025536 | -0.7606958 | -0.6587512 | -0.7967699 | -0.7988669 | -0.7818161 | -0.5315050 | -0.5121003 | -0.8942667 | 0.6414355 | -0.6288462 | -0.7864432 | -0.7790928 |
| Total, possession, other Controlled Drugs and Substances Act drugs | 0.8702986 | 0.9371958 | 0.5100074 | 0.9443876 | 0.9041758 | 0.8971434 | 0.9676695 | 0.8962826 | 0.7001886 | 0.8034666 | 0.9170933 | -0.7102608 | 0.5627222 | 0.9800383 | 0.9418946 |
The table provides a summary of the correlation between all types of violations and income sources in Ontario. Notably, COVID-19 benefits are excluded from the analysis due to their availability only during the pandemic years, which limits the dataset. The strongest correlation is observed between self-employment income and production under the Cannabis Act. However, it’s important to note that this relationship may not be entirely reliable due to the limited data available (3 observations), as illustrated in the plot below.
The second strongest correlation, with a coefficient of 0.98, is observed between total income and incidents of possession of other Controlled Drugs and Substances Act drugs. With additional data available, this relationship appears more promising. Upon examination, it becomes evident that as income levels increase, the number of incidents of possession of these drugs also tends to rise. Despite the negative correlation between total income and total crime rate in Ontario, as found in previous analyses, a positive relationship exists between total income and incidents of possession of other Controlled Drugs and Substances Act drugs.
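A correlation coefficient on its own says nothing about how many observations it rests on, which is the concern raised above for the Cannabis Act series. The sketch below is one way to report the number of complete pairs next to each coefficient. The vectors `income` and `rate` are made-up toy values, not figures from the Statistics Canada tables.

```r
# Toy data (hypothetical values, for illustration only)
income <- c(41000, 42500, 43800, 45100, 46900, 48200)
rate   <- c(NA, NA, NA, 12.1, 10.4, 7.9)  # series observed in only 3 years

# Number of complete (non-missing) pairs behind the correlation
n_pairs <- sum(complete.cases(income, rate))

# Pearson correlation computed on the complete pairs only
r <- cor(income, rate, use = "complete.obs")

# cor.test() also drops incomplete pairs; with 3 pairs the t-test
# has n - 2 = 1 degree of freedom, so the estimate is very unstable
ct <- cor.test(income, rate)

cat("pairs:", n_pairs, " r:", round(r, 3), " df:", ct$parameter, "\n")
```

A coefficient close to 1 or -1 that is based on three pairs should be read as a hint at most, whereas the same value over the full series carries much more weight.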
I created four levels of average total income using the quartiles and the mean: values up to the first quartile are “Low”, from the first quartile to the mean are “Med-Low”, from the mean to the third quartile are “Med-High”, and above the third quartile are “High”.
# Keep only the rows for total income
filter_census <- census_df %>% filter(`Income source` == "Total income")
# summary(filter_census$`Average income (excluding zeros)`)
# Break points taken from the summary above: first quartile, mean, third quartile
breaks <- c(-Inf, 40460, 45077, 48859, Inf)
filter_census$income_level <- cut(filter_census$`Average income (excluding zeros)`,
                                  breaks = breaks, labels = c("Low", "Med-Low", "Med-High", "High"))
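The break points above are typed in by hand from the `summary()` output. As a minimal sketch, assuming the same cut points are wanted (first quartile, mean, third quartile), they could instead be computed from the data so that they stay in sync if the table is updated. The vector `avg_income` below is a hypothetical stand-in for the income column, not real data.

```r
# Hypothetical incomes standing in for the "Average income (excluding zeros)" column
avg_income <- c(36000, 39500, 41200, 43800, 45500, 47000, 49900, 58000)

# First and third quartiles (R's default quantile type)
q <- quantile(avg_income, probs = c(0.25, 0.75), names = FALSE)

# Break points: first quartile, mean, third quartile
breaks <- c(-Inf, q[1], mean(avg_income), q[2], Inf)

# cut() sorts its breaks silently, so check that the mean really lies
# between the quartiles before attaching the labels
stopifnot(!is.unsorted(breaks))

income_level <- cut(avg_income, breaks = breaks,
                    labels = c("Low", "Med-Low", "Med-High", "High"))
table(income_level)
```

The check matters because, for a strongly skewed variable, the mean can fall outside the interquartile range, in which case the four labels would no longer mean what they say.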
From the box plot presented below, it appears that the total crime rate does not exhibit a clear trend of decreasing with higher levels of total income. This finding contradicts previous observations when examining the relationship between total crime rate and average total income by province. Therefore, it is conceivable that while a relationship exists, it may be influenced by other factors related to the demographics of each province. Consequently, when considering all observations collectively, the relationship becomes less apparent.
The earlier plots showed that the slope of the relationship between average total income and total crime rate differs across provinces, so we fit a model with interaction terms, which allows each province to have its own slope.
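library(broom)
# Regress crime rate on income with a province interaction, so each province gets its own slope
model <- lm(data = joint_df, `Rate per 100,000 population` ~ `Average income (excluding zeros)` * `GEO`)
# Collect the coefficient estimates into a data frame
tidy_coeffs <- tidy(model)
table <- tidy_coeffs %>%
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", full_width = TRUE)
table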
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 14793.6650192 | 1437.5519874 | 10.2908731 | 0.0000000 |
| Average income (excluding zeros) | -0.0965352 | 0.0258743 | -3.7309253 | 0.0002428 |
| GEOBritish Columbia | 13870.9677393 | 2312.3835770 | 5.9985583 | 0.0000000 |
| GEOManitoba | 7946.1906789 | 2296.0365821 | 3.4608293 | 0.0006468 |
| GEONew Brunswick | -2943.0958277 | 2103.8238576 | -1.3989269 | 0.1632431 |
| GEONewfoundland and Labrador | -8091.9410527 | 1705.7594792 | -4.7438934 | 0.0000038 |
| GEONova Scotia | 5230.7520879 | 2193.8216748 | 2.3843105 | 0.0179604 |
| GEOOntario | 6850.6041303 | 3311.9257794 | 2.0684655 | 0.0397653 |
| GEOPrince Edward Island | 1789.8353436 | 2029.0026916 | 0.8821257 | 0.3786716 |
| GEOQuebec | 1869.9125139 | 2382.7985390 | 0.7847548 | 0.4334416 |
| GEOSaskatchewan | 7458.4753196 | 1887.5177478 | 3.9514729 | 0.0001047 |
| Average income (excluding zeros):GEOBritish Columbia | -0.2946092 | 0.0467722 | -6.2988076 | 0.0000000 |
| Average income (excluding zeros):GEOManitoba | -0.1693983 | 0.0477122 | -3.5504187 | 0.0004703 |
| Average income (excluding zeros):GEONew Brunswick | -0.0316576 | 0.0458147 | -0.6909929 | 0.4902984 |
| Average income (excluding zeros):GEONewfoundland and Labrador | 0.0963984 | 0.0338286 | 2.8496116 | 0.0047929 |
| Average income (excluding zeros):GEONova Scotia | -0.2025528 | 0.0469679 | -4.3125776 | 0.0000244 |
| Average income (excluding zeros):GEOOntario | -0.2319253 | 0.0654035 | -3.5460653 | 0.0004777 |
| Average income (excluding zeros):GEOPrince Edward Island | -0.1471245 | 0.0441284 | -3.3340081 | 0.0010044 |
| Average income (excluding zeros):GEOQuebec | -0.1635189 | 0.0506113 | -3.2308755 | 0.0014233 |
| Average income (excluding zeros):GEOSaskatchewan | -0.0726509 | 0.0362467 | -2.0043456 | 0.0462590 |
The linear regression model examines the relationship between crime rate per 100,000 population and average income (excluding zeros), with the categorical variable GEO representing the provinces and an interaction term that lets the income slope differ by province. Three findings stand out. First, the income coefficient is negative (about -0.097) and significant. Because the model includes interaction terms, this is the slope for the reference province, where higher average income is associated with a lower crime rate. Second, the provinces differ in their baseline (intercept) terms, with British Columbia's significantly higher than that of the reference province. Third, the interaction terms show that the association between income and crime rate varies across regions: most are negative and significant, indicating a steeper negative slope than in the reference province, most strongly in British Columbia, whereas the New Brunswick interaction is not significant and the Newfoundland and Labrador interaction is positive. Overall, the adjusted R-squared of 0.9328 indicates that income, province, and their interaction together account for most of the variation in the rate per 100,000 population.
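In a model with interaction terms, the income slope for a given province is the income coefficient plus that province's interaction coefficient. The short sketch below does this arithmetic for three provinces, using estimates copied from the regression table. The variable names are my own and are not part of the original analysis.

```r
# Estimates copied from the regression table
base_slope <- -0.0965352  # income slope for the reference province

# Income-by-province interaction estimates for three example provinces
interaction <- c(
  "British Columbia"          = -0.2946092,
  "Newfoundland and Labrador" =  0.0963984,
  "Ontario"                   = -0.2319253
)

# Province-specific slope = reference slope + interaction term
province_slope <- base_slope + interaction
print(round(province_slope, 4))
```

Under these estimates the implied slope is about -0.39 for British Columbia and about -0.33 for Ontario, while for Newfoundland and Labrador the positive interaction almost exactly cancels the reference slope, leaving a slope close to zero.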
“What you found so far from your data in terms of the formulated question.”
Recall that the question of interest is: “How does crime rate relate to income in Canada?”